Protein mechanical unfolding: a model with binary variables 
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A simple lattice model, recently introduced as a generalization of the Wako-Saito model of protein folding, is 
used to investigate the properties of widely studied molecules under external forces. The equilibrium properties 
of the model proteins, together with their energy landscape, are studied on the basis of the exact solution of the 
model. Afterwards, the kinetic response of the molecules to a force is considered, discussing both force clamp 
and dynamic loading protocols and showing that theoretical expectations are verified. The kinetic parameters 
characterizing the protein unfolding are evaluated by using computer simulations and agree nicely with experi- 
mental results, when these are available. Finally, the extended Jarzynski equality is exploited to investigate the 
possibility of reconstructing the free energy landscape of proteins with pulling experiments. 

PACS numbers: 87.15.Aa, 87.15.He, 87.15.-v 



I. INTRODUCTION 

The three-dimensional structure of proteins is strictly con- 
nected to the biological functions these molecules perform 
in living cells \\\\. Among various experimental techniques, 
an increasingly important role in the study of protein struc- 
tures is being played by single-molecule force spectroscopy, 
where proteins Q HI S HI S (but also nucleic acid frag- 
ments l'9'l) are pulled by applying controlled forces to their 
ends through an atomic force microscope (AFM) or optical 
tweezers. 

By studying the dynamical response of proteins to constant 
or varying loading, much information on their structure has 
been gathered 1^13113, HI @]. In particular, the possibility of 
controlling the applied force with high precision has allowed 
to trace the molecule folding and unfolding pathways iHHTlHt]- 
Nevertheless, as the size of the molecules increases, force 
spectroscopy outcomes cannot be easily related to molecular 
properties. Thus theoretical models of biomolecules subject to 
an external force have been developed lflol[Tll[T2l[T3ll . which 
represent important tools to study the interplay between pro- 
tein response to external force and molecular structure. 

In most cases, these models are based on a coarse-grained 
description of the biomolecule, their dynamical degrees of 
freedom being related to the coordinates and velocities of a 
suitable set of reference beads, typically one or a few per 
amino acid ifTol [TTl [T2I [isll . Equilibrium and nonequilibrium 
results are then obtained by means of (time expensive) com- 
puter simulations. 

In a recent paper [14] we have approached the problem of 
protein unzipping from a different point of view, introducing 
a simple lattice model with binary degrees of freedom, based 
on (and generalizing) the Wako-Saito (WS) model of protein 
folding [15, 16]. Despite its simplicity, the model turned out to 
exhibit the typical response of real proteins to pulling. In par- 
ticular, the mechanical unfolding of the 27th immunoglobulin 



domain of titin was investigated, considering both force clamp 
and dynamic loading protocols. Theoretically expected laws 
were verified and an excellent agreement with the experimen- 
tal values of the characteristic kinetic parameter was found. 
The model was also used to investigate the possibility of re- 
constructing the protein free energy landscape b y explo iting 
an extended version of the Jarzynski equality (JE) iuTlllsifll. 

The present paper has several purposes. On the method- 
ological side, we give a full derivation of the relation between 
our model and the original WS model, and show in detail how 
to obtain free energy landscapes. On the application side, we 
consider three other molecules (including BBL, whose ther- 
mal behaviour has recently been the subject of some debate, 
see ref. flB] and references therein, and ref. fl^) in addition 
to titin and, together with the properties already investigated 
in lfl4ll . we discuss also the probability distribution of unfold- 
ing times and forces. 

The article is organized as follows. In Section Ull we de- 
scribe the details of our model, its connection with the WS 
model, and the simulation method. In Section |lll] we present 
the equilibrium properties of the model proteins, as functions 
of force and temperature. We then present the results of me- 
chanical unfolding obtained by numerical simulations, first, in 
Section lTVl for the force clamp manipulation and then, in Sec- 
tion|Vl for the dynamic loading. In Section lVll the free energy 
landscape of the model proteins is reconstructed from unfold- 
ing manipulations, by using an extended Jarzynski Equality. 
Conclusions are drawn, and future developments are sketched, 
in SectionlVn] 
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II. THE MODEL 

In the present section, we define our model and show that 
in the absence of an external force it reduces to the WS model 
of protein folding. The latter was introduced in 1978 by Wako 
and Saito, in two papers [15., 16.1 that appear to have been for- 
gotten until recent years. The same model was independently 
reintroduced by Munoz, Eaton and coworkers at the end of the 
'90s [22, 23, 24[. These authors used the model to describe 
and interpret experimental resultSj_and soon the model became 
quitepopular[|2l|2i|2l,|M|2i[33,[3l[3l[3l[M[3l. This 



is why it is often referred to as the Munoz-Eaton, or Wako- 
Saito-Munoz-Eaton, model. The first recent reference to the 
original work by Wako and Saito appeared, as far as we know, 
in 1I27I1 . 

Similarly to the WS model, the state of a protein is defined 
according to the conformation of its peptide bond backbone, 
and in order to reduce the degrees of freedom, we assume that 
the peptide bonds can exist only in two conformations: native 
and non-native. Thus a 1 amino acid protein is represented 
as a chain of peptide bonds, and a binary variable is 
associated to each peptide bond. Bonds are numbered from 
1 to and amino acids from 1 to + 1, bond k connecting 
amino acids k and k + I. The variable takes the value 0, 1 
corresponding to a peptide bond in unfolded or native state 
respectively. 

In order to couple the molecule to an external force we re- 
gard stretches of consecutive native bonds (delimited by un- 
folded bonds) as rigid portions of the protein with their own 
(native) end-to-end length: in the following /,j will indicate 
the end-to-end length of the stretch of consecutive amino acids 
connected by native bonds and delimited by unfolded bonds 
in positions / and j > i: mi - mj = 0, nik = I for i < k < j; if 
we define the quantity 

7-1 

S ij(m) = (1 - m,)(l - Mj) ]~~[ m^, (1) 

k=i+l 

this condition can be expressed in a more compact form: 
S ij{m) = 1. In the limiting case j - i - 1 the stretch re- 
duces to amino acid j = / -H 1, which is characterized by its 
native length In the following, "stretch" will refer to 

stretches of any length, including single amino acids. Bound- 
ary conditions mo - itin+i = are introduced to define the 
stretches at the protein ends. It is easy to verify that the num- 
ber of stretches is equal to 1 plus the number of O's in the set 
{nik, k^ 1,...N]. 

We want to keep our model as simple as possible, there- 
fore only two orientations of rigid stretches are considered: 
given the direction of the external force, a stretch can only be 
oriented parallel or antiparallel to the force direction. Thus, 
following we introduce the variables cr,y = +1 which de- 
scribe the orientation of the stretch with respect to the external 
force /. Therefore, the configuration of the molecule is fixed 
by the set of {{nik},{o'ij]) values, and for each configuration 
the protein end-to-end length is given by 

L({mk} Ao-ij}) ^ ^ lijO-ijS ijini). (2) 

0<;<j<A'+l 

A cartoon of the model protein is plotted in figH] It is worth 
noting that a given bond configuration {nik] dynamically fixes 
the set of variables {cr,y}: for each configuration [mk] only 
those variables {cr,y) such that 5, ^(m) = 1 are considered. 
As in the original WS model m S HI El B, 

in the 

present model two amino acids / and j + \ > i interact only 
if all the peptide bonds connecting them along the chain (that 
is, bonds from / to j) are in the native state, and if they are 
close enough in the native configuration. The Hamiltonian of 
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FIG. 1: Cartoon of the model protein, with a force applied to one 
of the free ends. Dots denote amino acids and dashed lines denote 
contacts. 

the model reads thus 

N-l N j 
1=1 ;=f+l k=i 

(3) 

The quantity e,j > represents the interaction en ergy be- 
tween the amino acids i and 7 + 1, (defined as in ll24ll30l[34ll . 
see below for details), while A,j = 0, 1 is the corresponding el- 
ement of the contact matrix, taking the value 1 if the distance 
between any two atoms of the two amino acids is smaller than 
some threshold distance, or the value if none of all the pos- 
sible atom pairs satisfies this condition, see discussion below. 

The microscopic degrees of freedom cr,^ do not interact 
among themselves, hence the partial partition sum over them 
can be performed analytically. We obtain the effective Hamil- 
tonian '7/eff , defined by 

2 exp [-/mm] , IcTij], /)] = exp [-frHMlmk} , /)] , 
that reads 

N-l N j 
/= 1 ;■=/■+ 1 *:=/ 

-kBT Yj In [2 cosh (/?//,-,)] 5, v(m). (4) 

0</<j<A'+I 

This effective Hamiltonian is a linear combination of products 
of consecutive nik's (including the single peptide bond) and 
therefore it has the same mathematical structure of the WS 
model, whose Hamiltonian reads 

N-l N j N 

•Wo((mt}) = - 2 Z ^'i^'i n '"'^ ~ Z ~ 

/=1 j=i+\ k=i 1=1 

where qj is the entropic cost of ordering bond /. Given this 
similarity, the partition function associated to the Hamiltonian 
(|4|i can be summed exactly, and the thermodynamic quantities 
can be obtained, as discussed in ref. |30i |3lj]. In the case 
f - 0, "Heff reduces to (up to an additive constant) 

N-l N j N 

'n.ffiimk} ' 0) = - 2 Z ^<j^'j n ""^ ~ ln2 ^(1 - nii), 

1=1 7=1+1 k=i 1=1 
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which corresponds to the WS model, eq.©, with qi - In 2. In 
the absence of force, our new orientational degrees of freedom 
correspond to an entropic gain qi associated to the unfolded 
peptide bonds. This quantity takes the value In 2 because we 
have made the simplest choice of two possible orientations 
per stretch. Of course this is a very rough discretization of 
the actual orientational degrees of freedom of the main chain, 
namely the dihedral angles. In principle, one could think to 
orientational variables taking values in a larger set, and this 
would yield different, even non-uniform, ^, values. 

As discussed in ref. ifTill , in the present model the quanti- 
ties Ay , eij and Uj are chosen by analyzing the native structure 
of a given molecule taken by the Protein Data Bank (pdb in 
the following, http://www.pdb.org/). We briefly review here 
the choice criteria for readability's sake. Two amino acids / 
and 7+1 (with j + \ > i + 2) are in contact (A,j = 1) if, in 
the native state of the protein, at least two atoms from these 
amino acids are closer than 4 A. In this case e,j is taken to be 
equal to ke, where k is an integer such that 5(k- 1) < < 5k, 
and Hat is the number of atoms of the two amino acids whose 
distance is not larger than the threshold distance. The quantity 
e is the protein energy scale, and is determined by imposing 
that, at zero force and at the experimental denaturation tem- 
perature r,„, the fraction of folded molecules is p{T,„) - 1/2. 
In order to estimate p we introduce the number of native pep- 
tide bonds M - Ylk=\ and its average density m - {M)/N, 
which in the following will be used as an order parameter 
The fraction of folded molecules is then estimated, assum- 
ing a two-state picture, as p - (m - mai)/(mo - OTkj), where 
'Woo = 1 /3 is the value of m at infinite temperature, while mo 
is a good representative of the folded state lf34ll . For most 
molecules we can take mo - m{T = 0) = 1, an exception be- 
ing the WW domain of PINl (pdb code 1I6C), which orders 
perfectly only at zero temperature and exhibits a wide plateau 
in m(T) in the range 200-300 K ll34ll . In this case we choose 
mo = m(T = 273K) < 1. 

As far as the parameters Ijj are concerned, the generic 
amino acid / is represented by its A^, - Caj - C, sequence. Tak- 
ing the native state as the reference configuration, /,j is chosen 
as the distance between the midpoint of the C, and A^,+i atoms 
and the midpoint of the Cj and A^^+i atoms. 

In the present paper we shall consider four different 
molecules of increasing size: IBBL (37 amino acids, see lf34ll 
for details), 1I6C (39 amino acids), ICOA (64 amino acids), 
ITIT (89 amino acids). Here and in the following the proteins 
are indicated with their pdb code. Some results on the me- 
chanical unfolding of ITIT and a few about 1I6C have already 
been reported in il4ll and wiU be recalled here for comparison 
with the other molecules. Results on the thermal unfolding 
of the other molecules, based on the WS model, have been 
reported in [30] (ICOA) and |34] (IBBL and 1I6C). In par- 
ticular, in 1 34] it was shown that IBBL differs from a clear 
two-state behaviour, though not being a true downhill folder 
as some authors have claimed. In the following we shall see 
that signals of the peculiar behaviour of IBBL appear also in 
the mechanical unfolding. 

While the equilibrium properties of the present model can 
be calculated exactly (see next section), results on the unfold- 



ing kinetics will be instead obtained by performing Monte 
Carlo (MC) simulations with Metropolis algorithm, using the 
effective Hamiltonian ([3]l. In the following to will indicate the 
system time scale, corresponding to a single MC step. 

III. EQUILIBRIUM PROPERTIES 

Once the temperature and the external force have been 
fixed, the macroscopic state of the system is defined by the 
order parameter m and by the molecule length L. 

In figure |2] the equilibrium values of such quantities are 
plotted as functions of the force, at room temperature, for the 
different molecules here considered. Inspection of these fig- 
ures suggests that the two larger molecules exhibit a sharp 
transition to the unfolded state, while for the two smaller 
molecules the unfolding is more gradual as the force is in- 
creased. The small force plateaux correspond to a global re- 
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FIG. 2: Panel (a): average order parameter m as a function of the ex- 
ternal force /, for different molecules: ITIT (fuUline), ICOA (dotted 
line), 1I6C (dashed line), IBBL (dashed-dotted line). The tempera- 
ture value is taken to be T = 300 K. Panel (b): Root mean square 
length of the same molecules as a function of the external force /, 
with T = 300 K. 

orientation of the molecule, which remains in its native state, 
under the external force. When the force increases a more 
or less sharp transition to an elongated state occurs. As in 
the thermal unfolding study L34i] we see that IBBL exhibits 
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a different, more gradual transition with respect to the other 
molecules, which exhibit a clearer two-state behaviour 

For each molecule, we can obtain a phase diagram by 
computing the locus of points in the temperature-force plane 
where the fraction of folded molecules p - 1/2. Such curves 
are plotted in Fig.|3] inspection of this figure indicates, again, 
that IBBL differs qualitatively from the other molecules. 
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FIG. 3: Phase diagram of ITIT (full line), ICOA (dotted line), 1I6C 
(dashed line), IBBL (dashed-dotted line). For each molecule, the 
curves are defined by p(f, T) = 1/2. The lower-left region of the 
phase diagram corresponds to folded molecules, the upper-right re- 
gion corresponds to unfolded molecules. 

In order to characterize the macroscopic state of the 
molecules with the observable quantity L, let us define the 
constrained zero-force partition function Zo(L) as follows: 



Zo(L)= Yj S{L-L{{r. 



(mt),{cr,y}))e 
and the corresponding free energy by 



-/W({«i,.),|o-i,),/=0) 



Fo(L) = -^BrinZo(L). 



(6) 



(7) 



In appendix lAl we show how Zo(L), as defined by eq. can 
be calculated exactly for our model. In figure Ua), the equi- 
librium free energy landscape of the four molecules here con- 
sidered is plotted as a function of the molecule length, for T = 
300 K. Inspection of this figure indicates that all the molecules 
have a minimum of the free energy at small L, the value of 
the minimum being compatible with the value of -^JiL?) at 
zero force, as plotted in fig.|3b). At larger force, the free en- 
ergy function (|7]i is tilted and reads F{L,f) - Fo(L) - f ■ L: 
this function exhibits new minima fo r eac h molecule, which 
are compatible with the values of V(^^ the large force 
regime, see fig.Efb). As an example, let us consider the ITIT 
molecule. In fig.Hb), the function F{L,f), with / = 10 pN, 
exhibits a global minimum at L ^ 225 A, which corresponds 
to the equilibrium value of -\j{L?) for the same molecule at 
/ = 10 pN, as shown in fig.|2jb). 



The function F{L, f) exhibits an energy barrier at small L, 
whose height AF and width AL depend on the value of /. In 
order to give an example of the typical values of these quanti- 
ties, let us define /1/2 as the force where the molecular length 
is half of its maximum value. 
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FIG. 4: Panel (a): Free energy landscape as a function of the molec- 
ular elongation L, at T = 300 K, for difi'erent molecules: ITIT (full 
line), ICOA (dotted line), 1I6C (dashed line), IBBL (dashed-dotted 
line). Panel (b): Tilted free energy landscapes F(L) - f ■ L, with 
/= lOpN. 



In table U we list the width AL and the height AF of 
the energy baiTier separating the two minima of the function 
F{L) -f-L, for / = /1/2 and T = 300 K. 



Molecule 


AL (A) 


mkBT) 


IBBL 


5 


2.7 


1I6C 


10 


1.6 


ICOA 


25 


5.7 


ITIT 


10 


12.7 



TABLE I: Width AL and height AF of the energy barrier separating 
the two minima of the free energy F(L) = Fo(L)-f-L, with / = /1/2, 
T = 300 K, for the molecules here considered. 
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IV. FORCE CLAMP 

In this section, we investigate the unfolding of the model 
proteins by application of a constant external force: such a 
manipulation scheme is usually called "force clamp". The 
force on the molecule is suddenly increased from to its final 
value /, and its length is measured ||7ll8|]. Usually the unfold- 
ing of a small molecule or of a portion of a large molecule is 
viewed as the overcoming of a kinetic barrier in the molecular 
energy landscape |36]. Such a barrier is characterized by a 
width Xu along the reaction coordinate, and by a height AEu 
over the corresponding minimum. Thus, the mean unfolding 
time is expected to follow the Arrhenius law 



Toe 



(8) 



where a>o is a microscopic attempt rate and tq - 
' exp(/3A£'„) is the mean unfolding time at zero force. Note 
that Eq. ([8]l is well defined as long as / < A/Su/Xy, i.e. as long 
as the process can be actually considered as a jump process 
over an energy barrier. For / > ^.E^jx^ one expects that the 
mean escape time Tu is independent of the external force but 
rather depends on the microscopic details of the system. On 
the other hand for too small forces the system will not unfold, 
if the barrier AE^ is sufficiently high (JSAE^ » 1). 

In order to check whether the unfolding time under constant 
force obeys such a law, and to extract the kinetic parameter x„, 
we run MC simulations to mimic the unfolding of molecules 
subject to a force clamp with force /, at time f = 0. We 
consider a molecule as unfolded as soon as its length takes 
the value L„ - L„ax/'^, where L„,ax is the molecule length in 
the completely unfolded state. For each molecule 1000 in- 
dependent unfolding trajectories are considered. In fig. |5] 
the mean unfolding time of the four molecules is plotted as 
a function of the force. In the case of ITIT, the force does 
not extend to the small force range, since we find that the un- 
folding time T„ goes to infinity in the very small force regime, 
i.e., the molecules do not unfold at all. Inspection of this fig- 
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FIG. 5: Mean unfolding time as a function of the force, alT = 300 
K, for four different molecules: boxes 1I6C, triangles IBBL, circles 
ICOA, inset: ITIT. 

ure suggests that the mean unfolding time as a function of the 



force follows eq. dSJ in a wide range of force values, and then 
saturates in the large force regime, as found in other works 
lfTll[T2ll . The force value separating the two regimes depends 
on the particular molecule. From fits to eq. ([8]l the values of 
the kinetic parameter x„ can be obtained for each molecule. 
In table the values of the unfolding length are listed for the 
molecules considered in this paper In principle, from such a 
fit procedure one is also able to obtain an estimate of tq. How- 
ever the quantity tq cannot be expressed in seconds, since this 
would require to evaluate the molecular time scale to. This, 
in turn, requires an experimental estimate of tq with the force 
clamp technique. It is worth noting that, despite the simplicity 
of our model, the unfolding length for ITIT is in remarkable 
agreement with the experimental value x^ - 2.5 A [3]. 



Molecule 


Xu (A) 


IBBL 


14.8 ±0.4 


1I6C 


20.3 ± 0.5 


ICOA 


20 ±2 


ITIT 


3.13 ±0.1 



TABLE II: Unfolding length x„ as given from linear fits to eq. {8}, 
for the molecules considered in this paper. 



The unfolding of the molecule is a stochastic process, and 
thus the unfolding time varies for each realization of the pro- 
cess. It is thus interesting to study the distribution of the un- 
folding time. In figure |6] we plot histograms of the unfold- 
ing time of the ITIT molecule at T = 300 K, for small and 
large force. In the large force regime a lognormal distribu- 
tion was proposed in [12], without a theoretical justification, 
and shown to fit quite well the results of molecular dynamics 
simulations. Our results are also well fitted by a lognormal 
distribution. Nevertheless, we find it interesting to look for a 
theoretical distribution for the large force regime. 

In the large force regime, one can imagine the chain as 
made up of a number M < N of stretches which can be easily 
turned by the force from the antiparallel to the parallel (with 
respect to the force itself) configuration. At the beginning of 
the pulling process, these stretches will be randomly oriented, 
half of them antiparallel and half parallel to the force. With 
a frequency a stretch will be selected at random by the 
kinetics and, if antiparallel to the force, turned parallel with 
probability 1 . Therefore, after a time t„ = kzi , the probability 
that the chain will be completely elongated in the force direc- 
tion will be p(M,k) = [1 - (1 - l/M)*]*'/2. The probabiHty 
distribution of the unfolding time will therefore be approxi- 
mated by /(tj, = kri) - p(M, k) - p{M, k-1), which for large 
M can be approximated as 



/(t„ = feTi) = 1/2 exp[-k/M - M/2 exp{-k/M)]. (9) 

As can be seen in Fig.|6] this formula fits reasonably well our 
data, though not as well as a lognormal. Similar results are 
obtained for the other molecules (data not shown). 
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50 100 150 200 250 300 350 
Tu (to) 

FIG. 6: Histograms of the unfolding time t„ for the ITIT molecule in 
a force clamp, at T = 300 K with force / = 130 pN (main figure) and 
/ = 48 pN (inset). The lines in the main figure are fits of the data to 
a log-normal function (full line) and to the function l|9) (line-points). 
The line in the inset corresponds to a fit to a negative exponential 
function. 



A. The Kramers problem 

We now ask the following question: is the free energy land- 
scape F(L) the "kinetic" potential associated to the diffusive 
motion of the system along the coordinate L? In other words, 
in the zero force regime, is the thermal motion of the molecule 
a diffusion process across the potential F(){L)7 A positive 
answer to this question would imply that the mean first pas- 
sage time of the system to a given value L* of the elongation 
should be given by the solution of the Kramers problem. The 
Kramers problem amounts to evaluating the mean first pas- 
sage time (mfpt) of a Brownian particle, moving in an energy 
potential U(x), to a given point Xf of the potential. If we let 
jco be the initial position of the particle, the mfpt from xq to Xf 
reads l|37tl 



(10) 



where D is the diff'usion coefficient, which basically sets the 
time scale of the process. 

In figure EJa), we plot the mean first passage time of the 
1I6C molecule at L* = L,„axl'^, as a function of /, for dif- 
ferent temperatures, as obtained by eq. ( fTOl l, where Uix) has 
been replaced by the free energy Fq{L) - f ■ L, as given by 
eq. (|7]l- Inspection of this figure suggests that in the small 
force range, the unfolding kinetics of the protein is actually 
described by the equilibrium free energy landscape F{L, f). 
The agreement improves as the temperature decreases. The 
mean first passage time of the molecule is not recovered by 
eq. ( [Tol l in the large force regime, because the energy differ- 
ence F(L*, f) - F(L - Q,f) becomes negative, and thus the 
motion towards the new minimum at large L becomes purely 
diffusive, see fig.|3b). 

The same behaviour is found for the IBBL molecule (data 
not shown). Since the unfolding time grows exponentially 
with the size of the molecule, for the two larger molecules 
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FIG. 7: Panel (a): Mean first passage time at L* = L,„„i/2 for the 
1I6C molecule and for different temperatures (in K), as obtained by 
direct computer simulations (points) and by eq. HOi (lines). Panel 
(b): Free energy landscape F as a function of the molecule length L, 
at r = 300 K, for different values of the force (in pN). Inset: Free 
energy landscape F at T = 1 8 1 K. 



we were not able to observe the unfolding in the small force 
regime within reasonable computation times. 



V. DYNAMIC LOADING 

In this section we consider the followingmanipulation 
strategy, which is often used in experiments!^ [H 0, HI |^: 
starting from equilibrium configurations, a time-dependent 
force is applied to our model molecule and the unfolding time 
is sampled. As in the case of the force clamp, we define 
the unfolding time as the first passage time of the molecule 
length across the threshold value L^. Here the force is taken 
to increase linearly with time, with a rate r. this manipula- 
tion scheme corresponds to the force-ramp experimental set- 
up discussed in f6\. Thus, the rupture force fi, is given by 

fl, = '•Tu- 

As discussed above, the breaking of a molecular linkage 
is typically described as a thermally activated escape process 
from a bound state over a barrier which dominates the kinet- 
ics. It can be shown that, if the energy barrier AE„ is large 
(compared to the thermal energy ksT) and rebinding is negli- 
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gible, the U^pical unbinding force of a single molecular bond 
reads EROM 



r = ^in 



rXuTQ 



(11) 



In fig.[8ja) the typical unbinding force /* (the most probable 
value of /„) is plotted as a function of the pulling velocity for 
the ITIT molecule, for three values of the temperature. In- 
spection of this figure suggests that the range of values of r, 
where /* is a linear function of In r, depends on the temper- 
ature: the smaller is T, the wider is this range. We find that, 
for this molecule, the value of the unfolding length, x„ ^ 3 A, 
obtained by fitting the data in the linear regime to eq. ( fTTT i. is 
independent of the temperature, as expected (see caption of 
the figure for the numerical values). Note that the value of the 
unfolding length, x„, in the linear regime defined by eq. (fTTT i. 
agrees with that found with the force clamp manipulation, and 
with the experimental value Xu = 2.5 A found in ref. iU. 
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FIG. 8: Panel (a): plot of the typical unbinding force /* of the ITIT 
molecule as a function of the pulling velocity r, for the three values 
of the temperature. The lines are fits to the data in the linear regime 
defined by eq. i ll It . From such fits one can obtain the characteristic 
unfolding length j„. We find x„ = 3.1 ± 0.1 A for T = 175 K, 
x„ = 3.0 ± 0.1 A for r = 262 K, and x„ = 3.05 ± 0.2 A for T = 300 
K. Panel (b): the lines are fits of the data to eq. (fT2l i. 

By repeating the same procedure for the other molecules, 
we estimate their unfolding lengths, the results are listed in 



Molecule 


xu (A) 


IBBL 


14.0 ±0.3 


1I6C 


17.7 ±0.5 


ICOA 


22.4 ± 0.5 


ITIT 


3.05 ± 0.20 



TABLE III: Unfolding length x„, as given from linear fits to eq. i ll It . 
for the molecules considered in this paper. 



table Hn] Again, tq cannot be directly compared with experi- 
mental values, but we see that the unfolding lengths are com- 
parable to the ones obtained by the force clamp protocol. 

However, as discussed above the rupture force is not a linear 
function of r in the whole range of the pulling rate: figure [8] 
rather suggests that the slope of /* increases as r increases. 
In refs. llssl [39l l40ll it has been argued that the appearance of 
diff'erent slopes in the /* vs. In r plot could be the signature 
of the presence of different escape paths from the folded state. 
Each of these diff'erent paths would be selected by pulling the 
molecule with a given rate r. Thus, the different slopes in the 
/* vs. In r plot correspond to different characteristic lengths 
x„ of the paths. On the other hand, in a recent work ll42ll . 
considering particular choices of the energy landscape which 
make exact computations feasible, it has been argued that, the 
typical unbinding force /* has a more complex expression 



(12) 



where the exponent v depends on the microscopic details of 
the energy landscape, and y is the Euler-Mascheroni constant 
y - 0.577. Equation (fT2] i reduces to eq. (fTTT ) in the limit 
AEi, — > oo, or when the exponent v takes the value I ll42ll . 
A similar expression for the rupture force was previously pro- 
posed in 14311 , with v - 2/3. In fig.[8]^b), the fits of the typi- 
cal unbinding force data to eq. ( fT2] i are plotted for the ITIT 
molecule. The fits turn out to be rather good, but statisti- 
cal errors are quite large. In order to reduce them, for each 
molecule, we considered sets of (r, /*) data at different tem- 
peratures, and we made joint fits according to Eq. (fT2b . The 
values of the unfolding lengths, energy barriers and exponents 
obtained by these fits are listed in table |IV] Comparison of 



Molecule 


x„ (A) 


AE,, {keT, T = 300K) 


V 


IBBL 


22 ± 1 


10.0 ±0.2 


0.61 ±0.03 


1I6C 


18 ± 1 


13 ±2 


0.7 ± 1 


ICOA 


31.5±4 


16 ± 1 


0.625 ± 0.06 


ITIT 


11.5±2 


18 ± 1 


0.42 ± 0.04 



TABLE IV: Unfolding length x„, barrier height Aii„, and characteris- 
tic exponent v as given from fits of the unfolding data with dynamic 
loading technique to eq. i ll 2b . 

the unfolding lengths listed in table |IV] and in table |lll] indi- 
cates that the values of the x„ obtained by fitting the typical 
unbinding force to eq. ( fT2l i are rather different from the val- 
ues obtained by fitting the same data to eq. ( fTTT i. although, as 
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discussed before, the fit to eq. ( fTTT i is restricted to the inter- 
mediate range of values of r. This is in agreement with refs. 
||42[ I43I1 . In those references it was argued that eq. (fTZt de- 
scribes the rupture force in the whole regime of r rather than 
just in the linear regime, as eq. (fTTT l does. It is worth noting 
that the values of the exponent v found for the IBBL, 1I6C 
and ICOA molecules (table|IV|i are compatible with the value 
V - Ijl) found in refs. |42, 43] for a particular choice of the 
energy landscape. Furthermore, the value of the kinetic bar- 
rier A£„ = 18 + 1 /tijr (at r = 300 K) found for the ITIT 
molecule, is of the same order of magnitude of the experi- 
mental value ~ 37.3kBT found in |3]. 

Similarly to the case of the force clamp, the distribution of 
the unbinding force exhibits a nontrivial dependence on the 
system kinetic parameters ll4lll : 



P{ f) = — e^/-*"" exp 
Tor 



(e^A, - 1) 



(13) 



The maximum of this distribution corresponds to the typical 
unbinding force, eq. ( fTTT ). 

In figure|9l the distribution of the unfolding force is plotted 
for the ITIT molecule and different rate values. Inspection 
of fig. |9] suggests that the observed distribution of unfolding 
force agrees nicely with the expected one. Similar results are 
obtained for the other molecules. 

In order to characterize the different unfolding paths occur- 
ring for different values of r, one can assume that the unfold- 
ing length Xu is a function of the pulling rate r, and exploit 
eq. ( fT3] ) to extract the value Jc„(r), so as to estimate the unfold- 
ing length at any value of r. In figure [10] the unfolding length 
Xu of the ICOA and ITIT molecules are plotted as a function 
of r, as obtained from this fitting procedure. As expected we 
find that Xu is a decreasing function of r. Thus, the values 
Xu = 22.4 A(ICOA) and Xu = 3.05 A(ITIT), obtained by fit- 
ting the typical unbinding force to eq. ( fTTI) (see, tablellllb. turn 
out to be weighted averages of the values Xu(r) as plotted in 

fig-iini 



VI. EVALUATING THE FREE ENERGY LANDSCAPE 
FROM PULLING EXPERIMENTS 

Given that we can compute exactly the free energy land- 
scape Fq{L) of our model, let us now address the problem of 
evaluating this landscape through the force manipulation ex- 
periments. As discussed above, applying a dynamic loading is 
equivalent to drive the system out of equilibrium by coupling 
it to the external potential U (L, t) - -f(t)L. 

The free energy landscape can be evaluated by using a fluc- 
tuation relation, which is an extended form of the Jarzynski 
equaUty HRll^. 



{6{L-L{[mk],[cTij]))t-l''^)^^t 



-fi{Fo(L)-f(t)L) 



/Zo, (14) 



where Zq = J dLZo(L), with Zo(L) as given by eq. (|6]l. Com- 
bining the Jarzynski equality 1 17], and the weighted histogram 
method ]44], it can be shown that, if the molecule is driven out 
of equilibrium by a time-dependent external potential coupled 
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FIG. 9: Distribution of unfolding force under dynamical loading for 
the ITIT molecule for (a) r = 0.06 pN/fo and (b) r = 0.15 pN/fo, with 
T = 300 K. The dotted line is a fit to eq. i fOl l. 



to its len gth U(L , /), the free energy F as a function of L is 
given by MB 



FiL) = -kgT In 



{S(L-U{mt),{a-ij}))exp{-j3W,))^ 
(exp(-/SW,)), 



exp(-t/(L.O) 
(exp(-/SW,)), 



(15) 



where W, is the thermodynamic work done on the system 
by the external potential, up to the time f, defined as W, = 

dt'd'H/dt', and the average (. . .), is over all the trajectories 
of fixed duration f. In an experimental situation, the work W, 
is not sampled continuously, but at successive discrete times 
0, At, 2At, . . . , MAt. Therefore the sum over t in Eq. (flSl l runs 
over these discrete values. 

The estimated free energy for the smaller molecules are 
plotted in fig. (TT] as obtained by 10000 independent pulling 
trajectories, together with the exact ones. As expected ]14, 
[l9ll . the curves obtained by numerical "experiments" collapse 
onto the expected one as the pulling rate r decreases. 

Within the present scheme, it was not possible to evalu- 
ate the free energy landscape of the two larger molecules: in- 
deed the work needed to completely unfold the two molecules 
(ICOA, ITIT) amounts to some hundreds of ksT, and thus 
the numerical precision in evaluating the average value of 
exp(-y6W) is rather scanty. 
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FIG. 10: Unfolding length x„ as a function of r, as obtained from 
fits of the unfolding force probability distribution to eq. l ll3t with 
T = 300 K, for the ICOA (a) and ITIT (b) molecules. 
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FIG. 11: Reconstructed free energy landscape F of the PlNl (a), and 
the IBBL (b) for T = 300 K. The lines correspond to the exact result. 



Our results suggest a practical procedure to estimate the 
free energy landscape of real proteins with dynamic loading 
experiments. The work done on the molecule has to be sam- 
pled for different pulling rates: for each rate eq. (flSl l pro- 
vides an estimate of the target function Fr{L). As the rate r 
decreases, the curves are expected to superimpose more and 
more: when the difference between the curves is of the order 
of few hgT, for the whole range of L, the estimate of F(L) can 
be considered reliable within this small uncertainty. 

VII. CONCLUSIONS 

In this paper we presented a comparative study of four 
widely studied proteins, by exploiting a simple model recently 
introduced. Such a model allows one to obtain analytically the 
equilibrium properties of the molecules considered. By using 
MC simulations we also study the unfolding kinetics of the 
molecules as they are pulled through an external force applied 
to their free ends. As already discussed in |14], the model 
turns out to exhibit the typical behaviour of a protein under- 
going a mechanical manipulation, both with the force-clamp 
scheme and with the dynamic-loading scheme. 

We systematically study the equilibrium free energy land- 
scape of the model molecule, as a function of an experimen- 



tally accessible coordinate, namely the molecular elongation. 
By comparing the unfolding time with the expected values, as 
given by the Kramers' formula, we find that, in the limit of 
small forces, the free energy landscape (|7|i represents the ki- 
netic energy landscape which governs the unfolding kinetics 
of the molecules, as discussed in ref. ll4Tll . Furthermore, by 
comparing the typical width of the energy barrier, as found 
from eq. ^ and listed in tableU with the unfolding length ob- 
tained from out-of-equilibrium unfolding experiments, listed 
in tables IIII4III1 we conclude that the latter parameters are not 
related to any typical length in the molecules, but are rather 
effective parameters. 

By considering pulling at different velocities, we found that 
the unfolding length varies with the pulling velocity: thus our 
results support the point of view that different escape paths 
exist for a given molecule, each path being selected by the 
features of the manipulation. 

Finally, the possibility of computing the free energy land- 
scape as an exact result, make our model an excellent test 
bed to check the application of the fluctuation relations to the 
study of the equilibrium properties of biomolecules. Our re- 
sults indicates that the free energy landscape can be recovered 
by out-of-equilibrium manipulations, and the collapse of the 
reconstructed curve represents an effective criterion to evalu- 
ate the reliability of the results. 
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In conclusion, we believe that our model is an useful tool 
to investigate the mechanical unfolding of proteins. We plan 
to further extend our work by studying the single folding and 
unfolding paths of proteins, by analyzing the unfolding kinet- 
ics of single substructures, so as to compare the results with 
experimental results. 
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APPENDIX A: EVALUATING THE FREE ENERGY 
LANDSCAPE FROM THE EQUILIBRIUM EXACT 
SOLUTION 

In this appendix we discuss how the summation in the par- 
tition function ^ can be performed exactly, so as to obtain 
the free energy landscape as a function of the molecule length 
(|7]i. A key point in our approach is to specify a (finite) dis- 
crete set of values for the length of a molecule. This is of 
course not a dangerous assumption, since atomic coordinates 
in the pdb are given with a finite resolution and it is there- 
fore fair to round the distances Ijj (measured in A) to rational 
numbers with a finite number of digits. In the applications re- 
ported in the present paper we use the resolution 10"^ A. For 
the sake of simplicity, in the following we will adopt a length 
unit such that the distances Ijj are integer numbers. Notice also 
that their absolute value is not greater than Lmax - Zi;Io 
which corresponds to the length of the molecule in the com- 
pletely unfolded, fully stretched configuration. 

We compute the partition function (|6]l within a recursive 
scheme, considering sequences of sub-chains made of peptide 
bonds from 1 to « < A^. For the sub-chain with n bonds we 
define the interaction energy 

n—ln j 

E„{m) = - Z ^'j^'j n ^^^^ 

(=1 J=i+l k=i 



and the length 



L„{m,cr) = Z Z UjCTijSijim). 



(A2) 



i=0 j=i+\ 



Notice that Ej^[{m) - fLN(m, cr) corresponds to the Hamilto- 
nian Q. We will also need the reduced partition function 

S„(/) = X Z exp[-y6£„(m) +yS/L„(m, cr)], (A3) 

m a 

where the first summation is over the first n peptide bond vari- 
ables and the second one is over the orientations of the corre- 
sponding stretches. 

Since the length values are integer by choice, we can ex- 
pand S„(/) in powers of as 



S„(/)= 2 Z„(L)e^-'^ 



(A4) 



Our goal is thus the evaluation of 'Z.n(P), which corresponds 
to the partition function Zo(L) (|6]l in the main text. 

Using the identity 



1 = 1- m„ H- ^(1 - m,_i) Y\ m, (A5) 
i=l k=i 

one can verify the recursive relations 

H„(/) = 2 J] cosh(^//,_i„+i)A;,(/), (A6) 



(=1 



and 



exp [y8 Z"k=i efaAfa] A;,_j(/) if / < n; 



H„_i(/) if / = n H- 1. 
where, for / e { 1 , . . . , n + 1 ), 



(A7) 



■ exp[-/3E„(m)+/3fL„(m,cr)]. (A8) 

where the sums over m and cr run as in eq. (IA3b . In the case 
« = we set Ho(/) = 2 cosh(j6/Zii) and A^(/) = 1. 

Expanding AJ, in powers of e^-' as 



(A9) 



we see that the coefficients of the above expansion satisfy 



exp [j3 EL,- ffaiAto] a'^^i(L) if i < n; 



(AlO) 



Z,n-\{J-) '\fi-n+ 1, 
with the initial condition a^iL) - 6(L). In addition we have 



«+i 



Zn{L) = - + + U-ln^l)] ■ (All) 



/=1 



Notice that the parity with respect to L can be exploited to 
reduce the above scheme to positive L values. 

Finally, we want to stress the importance of the integer 
character of the microscopic lengths Uj. We do not know a 
priori the values that the length of the molecule can assume, 
so our algorithm needs to span the whole interval [0, Lmnj^], 
that has thus to be a finite set. On the other hand, the above 
scheme can be viewed as a polynomial algorithm to find the 
values that L can assume: there are no configurations (m, cr) 
with length L when Zn(L) = 0. 
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